From Comparing Clusterings to Combining Clusterings

Authors

  • Zhiwu Lu
  • Yuxin Peng
  • Jianguo Xiao
Abstract

This paper presents a fast simulated annealing framework for combining multiple clusterings (i.e., a clustering ensemble) based on measures of agreement between partitions. These measures were originally used to compare two clusterings (the obtained clustering vs. a ground-truth clustering) when evaluating a clustering algorithm. Although a greedy strategy can optimize these measures as objective functions for the clustering ensemble, it may get stuck in local optima and its computational cost is too high. To avoid local optima, we consider a simulated annealing optimization scheme that operates through single label changes. Moreover, measures that are based on the relationship (joined or separated) of pairs of objects, such as the Rand index, can be updated incrementally for each label change, which keeps the simulated annealing scheme computationally feasible. Simulation and real-life experiments demonstrate that the proposed framework can achieve superior results.
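The incremental update mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed number of clusters `k`, the geometric cooling schedule, and the choice of the average Rand index over the ensemble as the objective are all assumptions made for illustration. The key point is that moving one object `i` to another cluster only changes the joined/separated status of pairs that contain `i`, so the objective can be updated without recounting all pairs.

```python
import math
import random
import numpy as np

def agreements(u, v):
    """Count pairs that are joined in both partitions or separated in both (O(n^2))."""
    u, v = np.asarray(u), np.asarray(v)
    iu = np.triu_indices(len(u), k=1)
    same_u = (u[:, None] == u[None, :])[iu]
    same_v = (v[:, None] == v[None, :])[iu]
    return int(np.sum(same_u == same_v))

def rand_index(u, v):
    """Rand index: fraction of object pairs on which the two partitions agree."""
    n = len(u)
    return agreements(u, v) / (n * (n - 1) // 2)

def delta_agreements(cand, ensemble, i, new_label):
    """Change in agreeing pairs, summed over the ensemble, if object i moves
    to new_label. Only pairs (i, j) can change, so this is O(n) per partition."""
    old_label = cand[i]
    if new_label == old_label:
        return 0
    others = np.arange(len(cand)) != i
    was_joined = (cand == old_label) & others   # pairs that become separated
    now_joined = (cand == new_label) & others   # pairs that become joined
    delta = 0
    for part in ensemble:
        same = part == part[i]                  # objects joined with i in this partition
        delta += int(np.sum(~same[was_joined])) - int(np.sum(same[was_joined]))
        delta += int(np.sum(same[now_joined])) - int(np.sum(~same[now_joined]))
    return delta

def anneal(ensemble, k, steps=5000, t0=0.1, cooling=0.999, seed=0):
    """Simulated annealing over single label changes (assumed geometric cooling).
    Returns the best partition found and its average Rand index."""
    rng = random.Random(seed)
    ensemble = [np.asarray(p) for p in ensemble]
    n = len(ensemble[0])
    scale = len(ensemble) * (n * (n - 1) // 2)  # normaliser for the average Rand index
    cand = np.array([rng.randrange(k) for _ in range(n)])
    agree = sum(agreements(cand, p) for p in ensemble)
    best, best_agree = cand.copy(), agree
    t = t0
    for _ in range(steps):
        i, new_label = rng.randrange(n), rng.randrange(k)
        d = delta_agreements(cand, ensemble, i, new_label)
        # Always accept improvements; accept worsening moves with probability exp(d / t).
        if d >= 0 or rng.random() < math.exp(d / scale / t):
            cand[i] = new_label
            agree += d
            if agree > best_agree:
                best, best_agree = cand.copy(), agree
        t *= cooling
    return best, best_agree / scale
```

For example, `anneal([[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1]], k=2)` returns a consensus labelling together with its average Rand index against the three input partitions. Tracking the integer count of agreeing pairs, rather than a floating-point score, avoids rounding drift over many incremental updates.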


Similar resources

Selecting Ensemble Members in Ensemble Clustering Using Voting

Clustering is the process of dividing a dataset into subsets called clusters, so that objects within a cluster are similar to each other and different from objects in other clusters. So far, many algorithms based on different approaches have been created for clustering. An effective choice is to combine two or more of these algorithms to solve the clustering problem. Ensemb...


Experiments on Comparing Graph Clusterings

A promising approach to compare graph clusterings is based on using measurements for calculating the distance. Existing measures either use the structure of clusterings or quality–based aspects. Each approach suffers from critical drawbacks. We introduce a new approach combining both aspects and leading to better results for comparing graph clusterings. An experimental evaluation of existing an...


Engineering Comparators for Graph Clusterings

A promising approach to compare two graph clusterings is based on using measurements for calculating the distance between them. Existing measures either use the structure of clusterings or quality-based aspects with respect to some index evaluating both clusterings. Each approach suffers from conceptional drawbacks. We introduce a new approach combining both aspects and leading to better result...


Comparing Clusterings by the Variation of Information

This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering C to clustering C′. The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings....


Generation of Alternative Clusterings Using the CAMI Approach

Exploratory data analysis aims to discover and generate multiple views of the structure within a dataset. Conventional clustering techniques, however, are designed to only provide a single grouping or clustering of a dataset. In this paper, we introduce a novel algorithm called CAMI, that can uncover alternative clusterings from a dataset. CAMI takes a mathematically appealing approach, combini...




Publication date: 2008